Active Learning for Data Streams Under Concept Drift and Concept Evolution

نویسندگان

  • Saad Mohamad
  • Moamar Sayed Mouchaweh
  • Abdelhamid Bouchachia
چکیده

Data streams classification is an important problem however, poses many challenges. Since the length of the data is theoretically infinite, it is impractical to store and process all the historical data. Data streams also experience change of its underlying distribution (concept drift), thus the classifier must adapt. Another challenge of data stream classification is the possible emergence and disappearance of classes which is known as (concept evolution) problem. On the top of these challenges, acquiring labels with such large data is expensive. In this paper, we propose a stream-based active learning (AL) strategy (SAL) that handles the aforementioned challenges. SAL aims at querying the labels of samples which results in optimizing the expected future error. It handles concept drift and concept evolution by adapting to the change in the stream. Furthermore, as a part of the error reduction process, SAL handles the sampling bias problem and queries the samples that caused the change i.e., drifted samples or samples coming from new classes. To tackle the lack of prior knowledge about the streaming data, non-parametric Bayesian modelling is adopted namely the two representations of Dirichlet process; Dirichlet mixture models and stick breaking process. Empirical results obtained on real-world benchmarks show the high performance of the proposed SAL method compared to the state-of-the-art methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

متن کامل

Categorizing Concepts for Detecting Drifts in Stream

Mining evolving data streams for concept drifts has gained importance in applications like customer behavior analysis, network intrusion detection, credit card fraud detection. Several approaches have been proposed for detection of concept drifts in the context of supervised learning in data streams. Recently, researchers have been looking into the problem of identifying concept drifts in unlab...

متن کامل

MOA Concept Drift Active Learning Strategies for Streaming Data

We present a framework for active learning on evolving data streams, as an extension to the MOA system. In learning to classify streaming data, obtaining the true labels may require major effort and may incur excessive cost. Active learning focuses on learning an accurate model with as few labels as possible. Streaming data poses additional challenges for active learning, since the data distrib...

متن کامل

Dynamic Cost-sensitive Ensemble Classification based on Extreme Learning Machine for Mining Imbalanced Massive Data Streams

In order to lower the classification cost and improve the performance of the classifier, this paper proposes the approach of the dynamic cost-sensitive ensemble classification based on extreme learning machine for imbalanced massive data streams (DCECIMDS). Firstly, this paper gives the method of concept drifts detection by extracting the attributive characters of imbalanced massive data stream...

متن کامل

Learning from Data Streams with Concept Drift

Increasing access to incredibly large, nonstationary datasets and corresponding demands to analyse these data has led to the development of new online algorithms for performing machine learning on data streams. An important feature of real-world data streams is " concept drift, " whereby the distributions underlying the data can change arbitrarily over time. The presence of concept drift in a d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016